The Journal of China Universities of Posts and Telecommunications ›› 2010, Vol. 17 ›› Issue (4): 100-109. doi: 10.1016/S1005-8885(09)60495-7

Fuzzy Q-learning in continuous state and action space

XU Ming-liang, XU Wen-bo   

  1. Department of Electronic Information Engineering, Wuxi City College of Vocational Technology, Wuxi 214063, China
  • Received: 2009-11-27 Revised: 2010-06-23 Online: 2010-08-30 Published: 2010-08-31
  • Contact: XU Ming-liang E-mail: xml1973@126.com
  • Supported by:

    This work was supported by the National Natural Science Foundation of China (60703106).

Abstract:

An adaptive fuzzy Q-learning (AFQL) method based on a fuzzy inference system (FIS) is proposed. The FIS, realized by a normalized radial basis function (NRBF) neural network, is used to approximate the Q-value function, whose input is the state-action pair. The rules of the FIS are created incrementally according to the novelty of each state-action pair. Moreover, the premise and consequent parts of the FIS are updated with an extended Kalman filter (EKF). The action applied to the environment is the one that maximizes the FIS output in the current state, and it is found by an optimization method. Simulation results on the wall-following task for mobile robots and the inverted pendulum balancing problem demonstrate the superiority and applicability of the proposed AFQL method.
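
As a minimal sketch of the mechanics the abstract describes (not the authors' implementation: the class and parameter names are hypothetical, the Gaussian membership functions and distance-based novelty test are assumptions, a linear Kalman update of the consequents stands in for the full EKF that also adapts the premises, and a finite candidate-action search stands in for the optimization of the action), the NRBF Q-function approximator with incremental rule creation could look like:

    import numpy as np

    class NRBFQ:
        """Sketch of a normalized-RBF (zero-order fuzzy) Q-function.

        Each rule i has a Gaussian premise center over the joint
        (state, action) input and a scalar consequent weight.
        """

        def __init__(self, input_dim, sigma=0.5, novelty_threshold=0.3, p0=10.0):
            self.centers = np.empty((0, input_dim))  # rule premise centers
            self.weights = np.empty(0)               # rule consequents
            self.P = np.empty((0, 0))                # covariance for Kalman update
            self.sigma = sigma
            self.novelty_threshold = novelty_threshold
            self.p0 = p0                             # initial variance of new rules

        def _phi(self, state, action):
            x = np.concatenate([state, action])
            d2 = np.sum((self.centers - x) ** 2, axis=1)
            g = np.exp(-d2 / (2.0 * self.sigma ** 2))
            return g / (g.sum() + 1e-12)             # NRBF normalization

        def q_value(self, state, action):
            if self.weights.size == 0:
                return 0.0
            return float(self._phi(state, action) @ self.weights)

        def maybe_add_rule(self, state, action, q_init=0.0):
            # Incremental rule creation: add a rule only when the
            # state-action pair is novel, i.e. far from all centers.
            x = np.concatenate([state, action])
            if (self.weights.size == 0 or
                    np.min(np.linalg.norm(self.centers - x, axis=1))
                    > self.novelty_threshold):
                self.centers = np.vstack([self.centers, x])
                self.weights = np.append(self.weights, q_init)
                n = self.weights.size
                newP = np.eye(n) * self.p0
                if n > 1:
                    newP[:n - 1, :n - 1] = self.P
                self.P = newP

        def td_update(self, state, action, target, r_noise=1.0):
            # Kalman-filter update of the linear-in-weights consequents;
            # the paper's EKF additionally adapts the Gaussian premises.
            phi = self._phi(state, action)
            Pphi = self.P @ phi
            k = Pphi / (phi @ Pphi + r_noise)        # Kalman gain
            self.weights += k * (target - phi @ self.weights)
            self.P -= np.outer(k, Pphi)

        def greedy_action(self, state, candidate_actions):
            # Coarse stand-in for the paper's optimization step:
            # evaluate a finite set of candidates and take the arg-max.
            qs = [self.q_value(state, a) for a in candidate_actions]
            return candidate_actions[int(np.argmax(qs))]

A TD target for td_update would be r + gamma * Q(s', a'*), with a'* the greedy action in the next state; creating rules only for novel state-action pairs keeps the rule base compact, while the Kalman-style update tracks the changing Q-values as the policy improves.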

Key words:

Q-learning, FIS, continuous, adaptation